Ground-truth transcriptions of real music from force-aligned MIDI syntheses
Abstract
Many modern polyphonic music transcription algorithms are presented in a statistical pattern recognition framework. But without a large corpus of real-world music transcribed at the note level, these algorithms cannot take advantage of supervised learning methods, and they have difficulty reporting a quantitative measure of their performance, such as a Note Error Rate. We attempt to remedy this situation by taking advantage of publicly available MIDI transcriptions. By force-aligning synthesized audio generated from a MIDI transcription with the raw audio of the song it represents, we can correlate note events within the MIDI data with the precise time in the raw audio where each note is likely to be expressed. These alignments will support the creation of a polyphonic transcription system based on labeled segments of produced music. But because the MIDI transcriptions we find are of variable quality, an integral step in the process is automatically evaluating the integrity of the alignment before using the transcription as part of any training set of labeled examples. Comparing a library of 40 published songs to freely available MIDI files, we were able to align 31 (78%). We are building a collection of over 500 MIDI transcriptions matching songs in our commercial music collection, for a potential total of 35 hours of note-level transcriptions, or some 1.5 million note events.
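The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which alignment method or features are used, so this example assumes dynamic time warping (a common choice for this kind of task) over generic per-frame feature vectors. The function names, the cosine distance, the hop size, and the mean-path-cost quality score are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def cosine_distance_matrix(synth, audio):
    """Pairwise cosine distance between rows of synth (n, d) and audio (m, d)."""
    synth_unit = synth / (np.linalg.norm(synth, axis=1, keepdims=True) + 1e-9)
    audio_unit = audio / (np.linalg.norm(audio, axis=1, keepdims=True) + 1e-9)
    return 1.0 - synth_unit @ audio_unit.T


def dtw_path(cost):
    """Return the lowest-cost monotonic path through cost and its total cost."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path, float(acc[n, m])


def map_time(path, synth_time, hop_seconds):
    """Map a time in the synthesized audio to a time in the real recording."""
    last_synth_frame = path[-1][0]
    synth_frame = min(int(round(synth_time / hop_seconds)), last_synth_frame)
    audio_frames = [a for s, a in path if s == synth_frame]
    return audio_frames[0] * hop_seconds


# Toy data: the "recording" plays each synthesized frame twice (half tempo).
synth_features = np.eye(4)
audio_features = np.repeat(synth_features, 2, axis=0)

cost = cosine_distance_matrix(synth_features, audio_features)
path, total_cost = dtw_path(cost)

# Hypothetical integrity score: mean cost along the path (lower is better).
mean_path_cost = total_cost / len(path)

# A MIDI note starting at synthesized frame 2 lands at recording frame 4.
hop = 0.01  # assumed frame hop in seconds
note_onset_in_recording = map_time(path, 2 * hop, hop)
```

In this sketch, a high mean path cost would be one simple signal that a MIDI file does not match the recording and should be kept out of a training set; a real system would likely need a more careful criterion.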
Similar resources
Synthesized Polyphonic Music Database with Verifiable Ground Truth for Multiple F0 Estimation
To study and to evaluate a multiple F0 estimation algorithm, a polyphonic database with verifiable ground truth is necessary. Real recordings with manual annotation as ground truth are often used for evaluation. However, ambiguities arise during manual annotation, which are often set up by subjective judgements. Therefore, in order to have access to verifiable ground truth, we propose a systema...
Extracting Ground-Truth Information from MIDI Files: A MIDIfesto
MIDI files abound and provide a bounty of information for music informatics. We enumerate the types of information available in MIDI files and describe the steps necessary for utilizing them. We also quantify the reliability of this data by comparing it to human-annotated ground truth. The results suggest that developing better methods to leverage information present in MIDI files will facilita...
An Expert Ground Truth Set for Audio Chord Recognition and Music Analysis
Audio chord recognition has attracted much interest in recent years, but a severe lack of reliable training data—both in terms of quantity and range of sampling—has hindered progress. Working with a team of trained jazz musicians, we have collected time-aligned transcriptions of the harmony in more than a thousand songs selected randomly from the Billboard “Hot 100” chart in the United States b...
Phrase-Level Audio Segmentation of Jazz Improvisations Informed by Symbolic Data
Computational music structure analysis encompasses any model attempting to organize music into qualitatively salient structural units, which can include anything in the hierarchy of large-scale form, down to individual phrases and notes. While much existing audio-based segmentation work attempts to capture repetition and homogeneity cues useful at the form and thematic level, the time scales in...
Large-Scale Content-Based Matching of MIDI and Audio Files
MIDI files, when paired with corresponding audio recordings, can be used as ground truth for many music information retrieval tasks. We present a system which can efficiently match and align MIDI files to entries in a large corpus of audio content based solely on content, i.e., without using any metadata. The core of our approach is a convolutional network-based cross-modality hashing scheme wh...